1 

Sparsity Averaging for Compressive Imaging 

Rafael E. Carrillo, Jason D. McEwen, Dimitri Van De Ville, Jean-Philippe Thiran, and Yves Wiaux 



CM ■ 

o; 

<N. 



C/2 



> 
o 
m 
m 

<N 
00 

o 



Abstract — We propose a novel regularization method for sparse 
image reconstruction from compressive measurements. The ap- 
proach relies on the conjecture that natural images exhibit strong 
average sparsity over multiple coherent frames. The associated 
reconstruction algorithm, based on an analysis prior and a 
re weigh ted t\ scheme, is dubbed Sparsity Averaging Re weigh ted 
Analysis (SARA). We test our prior and the associated algorithm 
through extensive numerical simulations for spread spectrum and 
Gaussian acquisition schemes suggested by the recent theory of 
compressed sensing with coherent and redundant dictionaries. 
Our results show that average sparsity outperforms state-of-the- 
art priors that promote sparsity in a single orthonormal basis 
or redundant frame, or that promote gradient sparsity. We also 
illustrate the performance of SARA in the context of Fourier 
imaging, for particular applications in astronomy and medicine. 

Index Terms — Compressed sensing, sparse approximation, im- 
age reconstruction, wavelet representation. 



I. Introduction 

The now famous theory of compressed sensing (CS) intro- 
duces a signal acquisition framework that goes beyond the tra- 
ditional Nyquist sampling paradigm Q-EI. The fundamental 
premise in CS is that certain classes of signals, such as natural 
images, have a concise representation in terms of a sparsity 
basis where most of the coefficients are zero or small, and only 
few are significant. In the CS framework, a signal is sampled 
taking a few linear projections (measurements) onto a set of 
vectors and subsequently recovered using an optimization for- 
mulation that determines the sparsest representation consistent 
with the measurements (H . Compressed sensing acquisition of 
data has an important impact on the design of imaging devices 
where data acquisition is expensive. A single pixel camera that 
acquires random projections from the visual scene through 
a digital micromirror array is detailed in 0. Compressive 
hyper- spectral and optical imagers based on coded aperture 
are also proposed in the literature |6|. Compressive imaging 
strategies have also been proposed in medicine Q-O and 
astronomy ifTOl-HH. 
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Sparse image representation assumes the ability to describe 
signals as linear combinations of a few atoms from a pre- 
defined dictionary. As such, the choice of the dictionary that 
sparsifies the signals is crucial for the success of this model. 
Orthonormal bases, especially wavelet bases, have been very 
popular in imaging linear inverse problems (see (T3l . fPfll and 
references therein). However, there are numerous practical 
examples in which the signal of interest is not sparse in an 
orthonormal basis but rather in terms of an overcomplete 
dictionary. Thus the use of overcomplete dictionaries is 
now widespread in sparse signal and image processing (T5l - 
fT9ll . Other approaches do not fix a pre-defined dictionary 
but instead learn the dictionary to perform best on a training 

set Eq|, ED. 

The standard theory of CS tackles the case of signals sparse 
in the coordinate basis or in some other orthonormal basis, 
but recent works have begun to address the case of redundant 
dictionaries. Rauhut et al. find conditions on the dictionary 
such that the signal can be accurately recovered 1221 . Candes 
et al. extend the CS theory to coherent and redundant dic- 
tionaries, providing theoretical stability guarantees based on 
general conditions of the sensing matrix, coined the Dictionary 
Restricted Isometry Property (D-RIP) E3- The D-RIP is a 
natural extension of the standard RIP in CS l24l . In fact 
many random matrices that obey the standard RIP also obey 
the D-RIP, like Gaussian or Bernoulli ensembles. Also, the 
subsampled Fourier matrix multiplied by a random sign matrix 
satisfies the D-RIP [25], which falls within the framework of 
the recently proposed spread spectrum technique (26]. The 
theory promotes the use of analysis-based priors for signal 
recovery. CS with dictionary learning has also been studied. 
A best basis extension of CS, where an iterative thresholding 
algorithm performs both the recovery of the signal and the 
estimation of the best basis, is proposed in l27l . 

The reweighted i\ minimization scheme was proposed in 
l28l to approximate the £o minimization behaviour. The algo- 
rithm replaces the t\ norm by a weighted l\ norm and solves 
a sequence of weighted £i problems with weights essentially 
the inverse of the values of the solution of the previous 
problem. Empirical results have shown that this approach is 
very effective in reducing the number of measurements needed 
for accurate recovery compared to standard £\ minimization 

El, Ei, E3- 

In the present work, we propose a novel sparsity analysis 
prior for compressive imaging in the context of the CS 
theory with coherent and redundant dictionaries. Our approach 
relies on the conjecture that natural images are simultaneously 
sparse in various frames, in particular wavelet frames, or 
in their gradient, so that promoting average signal sparsity 
over multiple frames is an extremely powerful prior. The 
associated reconstruction algorithm, based on an analysis prior 



and a reweighted t\ scheme, is dubbed Sparsity Averaging 
Reweighted Analysis (SARA). We test SARA through exten- 
sive numerical simulations for spread spectrum and Gaussian 
acquisition schemes. Our results show that the average sparsity 
prior outperforms state-of-the-art priors that promote sparsity 
in a single orthonormal basis or redundant frame, or that 
promote gradient sparsity. We also depart from the theory 
and illustrate the performance of our conjecture and associated 
algorithm in the context of Fourier imaging, for particular 
applications in astronomy and medicine. 

The organization of the remainder of the paper is as follows. 
In Section HH we review the theory of CS for both standard 
orthonormal bases and redundant dictionaries. Section UTT] 
presents the SARA algorithm and practical considerations for 
its implementation. Numerical studies comparing SARA to 
state-of-the-art sparse reconstruction approaches are presented 
in Section [IVJ Finally we conclude in Section [VI with a 
discussion of our results, along with future work. 

II. Compressed Sensing 

In this section, we recall the basics of compressed sensing. 
Firstly, we briefly review the standard compressed sensing 
theory for orthonormal bases. Secondly, we discuss recent 
results in compressed sensing with redundant dictionaries. 
Finally, we review the reweighted i\ minimization method 
for sparse signal recovery. 

A. Compressed sensing with orthonormal bases 

Consider a complex-valued signal x G and also con- 
sider an orthonormal basis W G C NxN , such that x = M/a, 
ol G C^. The signal is said to be sparse if it contains only K 
non-zero coefficients in its decomposition, where K <C N, or 
compressible if its ordered set of coefficients decays rapidly 
and the signal can be well approximated by just the first K 
coefficients. 

CS demonstrates that a signal that is sparse, or compress- 
ible, in some basis can be acquired using a low-rate acquisition 
process that projects the signal onto a small set of vectors [|T|- 
Bl . The signal x is measured through the following model 

y = 0cc -|- n. (1) 

where y G C M denotes the measurement vector, G C MxN , 
M < TV, is the sensing matrix and n G C M represents the 
observation noise. The most common approach to recover x 
from y is to solve the so-called Basis Pursuit denoise problem 
CI, 10: 

rniri^ ||a||i subject to \\y - 0^a|| 2 < e, (2) 

where e is an upper bound on the £2 norm of the noise. Recall 
that the t v norm of a complex-valued vector a G C M is 
defined as ||a|| p = (YlfLi \ a i\ p ) l ^ P i where | • | represents the 
modulus of a complex number. The signal is recovered as 
x = Wct, where d denotes the solution to (|2]). Such problems 
solving for the representation of the signal in a sparsity basis 
are known as synthesis-based problems. The theory shows that 
the Basis Pursuit denoise problem can recover x if the sensing 



matrix obeys certain properties like the Restricted Isometry 
Property (RIP) L24J or the mutual coherence lf30l . A family of 
iterative greedy algorithms are also proposed in the literature 
that offer similar recovery guarantees 13H - 1331 . 

B. Compressed sensing with coherent and redundant dictio- 
naries 

The techniques presented in the previous subsection hold 
for signals that are sparse in the standard coordinate basis 
or sparse with respect to some other orthonormal basis. 
However, there are numerous practical examples in which a 
signal of interest is not sparse in a single orthonormal basis 
but over several orthonormal bases or over an overcomplete 
dictionary lfl6l . |[T8lL |[T9l . In this setting the signal x is now 
expressed in terms of a dictionary ^ G C NxD , N < D, as 
x = Va, ol G C D . Since M <C N < D, the problem is more 
severely underdetermined when solving for the signal in the 
sparsity basis. Rauhut et al. found in l22ll that incoherence 
in the dictionary v|/ is needed such that the compound matrix 
<t>v|/ obeys the RIP to accurately recover a. Recall that he 
coherence of two dictionaries (or one dictionary with itself) 
may be defined as the maximum absolute value of the scalar 
scalar product between their atoms l30l . 

As opposed to synthesis-based problems, which find the 
the sparse representation vector a, analysis-based problems 
recover the signal itself by solving the following problem l34l : 

min ||V t x||i subject to \\y - <t>x\\ 2 < e, (3) 

xeC N 

where t denotes the conjugate transpose operation. In the case 
of orthonormal bases W the two approaches are equivalent. 
However, when ^ is a frame or an overcomplete dictionary, 
the two problems are no longer equivalent. The geometry 
of the two problems are studied by Elad et al. in [34], who 
show that there is a large gap between the two formulations 
due to substantial difference in their geometrical structures. 
Note that the analysis problem does not increase the dimen- 
sionality of the problem when overcomplete dictionaries are 
used. Empirical studies have shown very promising results for 
the analysis approach 1341 . Theoretical properties of the i\ 
analysis regularization for inverse problems and its robustness 
against noise are studied in l35l . 

Candes et al. provide a theoretical analysis of the i\ 
analysis problem coupled with redundant dictionaries in l23l . 
They extend the compressed sensing theory to coherent and 
redundant dictionaries, as opposed to the synthesis results 
in 1221 . providing theoretical stability guarantees based on a 
generalization of the RIP of the sensing matrix 0, the so-called 
D-RIP. These results have been proven for the case of real- 
valued signals only. In the following we define the D-RIP. Let 
W be an overcomplete dictionary. Also, let be the union 
of all subspaces spanned by all subsets of K columns of M/. 
The measurement matrix obeys the D-RIP with constant 

6k e (0, 1) if 

(i-^)||^||i<||^||i< (i + fe)lMH (4) 

holds for all v G H^. 



The D-RIP is a natural extension of the standard RIP. In fact 
many random matrices that obey the standard RIP also obey 
the D-RIP, like Gaussian or Bernoulli ensembles, provided 
that M = O (Klog(D/K)). Also, a result by Krahmer and 
Ward implies that the subsampled Fourier matrix multiplied 
by a random sign matrix satisfies the D-RIP, which provides 
a fast sensing operator [25] . Interestingly, this approach falls 
within the framework of the recently proposed spread spectrum 
technique (26). If O satisfies the D-RIP with S K < 0.08 and V 
is a general frame, Candes et al. prove in ll23l that the solution 
to ® x satisfies the following error bound: 



with G C NxD , N < D. The analysis-based framework is 
a nice approach to promote average sparsity. The prior 
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x\\ 2 < Coe + Ci 



\^x - (i|/ta;) 



K 



(5) 



where (^x)k denotes the best if -term approximation of ^x 
and Co and C\ are numerical constants. 

C Reweighted i\ minimization 

The reweighted i\ minimization algorithm was proposed by 
Candes et al. in [28] to simulate the £o minimization behaviour. 
The algorithm replaces the t\ norm by a weighted t\ norm 
|| Wo? ||i, where W is a diagonal matrix with positive weights. 
The algorithm solves a sequence of weighted £i problems with 
weights essentially the inverse of the values of the solution of 
the previous problem. The idea behind this formulation is 
that large weights will encourage small entries in the solution 
vector, and small weights will encourage larger values in the 
corresponding entries. This scheme has been observed to be 
very effective in reducing the number of measurements needed 
for recovery, and to outperform standard i\ minimization in 
many situations, see e.g. 



III. Sparsity Averaging Reweighted Analysis 

In this section we propose a novel algorithm for compressive 
imaging problems based on the prior of average signal sparsity 
over multiple coherent frames. We start by discussing our 
conjecture of average signal sparsity. Then, we propose the 
reweighted i\ analysis method as a means to promote average 
sparsity and present the resulting algorithm. Finally, we detail 
the optimization methods used to solve the reweighted i\ 
problem in the proposed algorithm. 



A. Average sparsity conjecture and associated analysis prior 

Natural images are often complicated and several types 
of structures can be present at once. It is well known that 
piecewise smooth images exhibit gradient sparsity, and that 
images with extended structures are better encapsulated in 
wavelet frames. Therefore, we here conjecture that promoting 
average sparsity or compressibility over multiple frames rather 
than single frames is an extremely powerful prior. 

We propose using a dictionary composed of a concatenation 
of q frames, i.e. 

v|/ = ^[\|/i,H/ 2 ,...,i|/,], (6) 



*t*||i~-^||*tx||i 

2=1 



(7) 



can indeed be considered, which is a convex relaxation of 
average sparsity 



l^llo-^Ell^llo- 



(8) 



Note that in this setting each frame contains all the sig- 
nal information. Such a prior cannot be formulated in a 
synthesis-based perspective. Previous works considering mul- 
tiple frames, e.g. [!T5ll-lfT71h consider a component separation 
approach, decomposing the signal as x = Y^i=i x ^ wnere 
each component Xi is sparse in the i-th frame. This is a 
completely different problem, where each component bears 
only part of the signal information, which can be address either 
in an analysis or in a synthesis framework. 

Note on a theoretical level that a single signal cannot be 
arbitrarily sparse simultaneously in any pair of incoherent 
frames. For example, a signal extremely sparse in the Dirac 
basis is completely spread in the Fourier basis and thus ([5]) 
does not provide a good error bound. As discussed by Candes 
et al. in l23lh what is important is that the columns of the Gram 
matrix v|/tv|/ are reasonably sparse such that W^x is sparse 
when x admits a sparse representation a such that x = v|/a. 
This requirement is nothing else than a coherence condition 
on ty. In our case of concatenations of frames, this leads to 
the condition that each 4^ is highly coherent with itself and 
with the other frames. 

The concatenation of the first eight orthonormal Daubechies 
wavelet bases (Dbl-Db8, q = 8) represents a natural choice 
in this context. The first Daubechies wavelet basis, Dbl, 
is the Haar wavelet basis and, in particular, it can be used 
as an alternative to gradient sparsity (usually imposed by a 
total variation prior l36l , l37l ) to promote piecewise smooth 
signals. The Db2-Db8 bases are coherent with Haar, in 
particular due to their compact support and identical sampling 
positions, while providing smoother decompositions. Also 
note that the approaches in lfT5l - lfT71l use incoherent frames 
for the decomposition, while our average sparsity prior takes 
the opposite direction. 

B. Constrained reweighted l\ analysis problem 

For what concerns the approach to combine the prior and the 
data, we choose to solve a constrained problem, minimizing 
the priori functional subject the a bound on the residual noise 
power. Such problems offer a strong fidelity term when 
the noise power is known, so that a precise value can be 
estimated for the bound on the residual noise. On the contrary, 
unconstrained minimization problems, using a weighted sum 
of the prior and of the residual noise power would resort to 
an arbitrary weight parameter that can not be evaluated from 
the noise power. These considerations naturally lead us to 
the same minimization problems as those promoted in the 
context of CS with redundant dictionaries, to which we add a 



4 



reweighting scheme promoting £q. The weighted £\ analysis 
problem is defined as 



min HWvl^xHi subject to \y - 4>x|| 2 < e, 



(9) 



where W E M DxZ:) is a diagonal matrix with positive weights. 
Assuming i.i.d. complex Gaussian noise with variance cr n , 
the £2 norm term in © is identical to a bound on the 
X 2 distribution with 2M degrees of freedom governing the 
noise level estimator. Therefore, we set this bound as e 2 = 
(2M+4\/M)cr 2 /2, where a 2 /2 is the variance of both the real 
and imaginary parts of the noise. This choice provides a likely 
bound for ||n||2, since the probability that ||n|| 2 exceeds e 2 is 
the probability that a \ 2 with 2M degrees of freedom exceeds 
its mean, 2M, by at least two times the standard deviation 
2y/~M, which is very small. The solution to © is denoted as 
A(y, 0, W, e), which is a function of the data vector y, the 
measurement and weight matrices and W respectively, and 
the bound e on the noise level estimator. 

In the reweighting approach a sequence of weighted £\ 
problems is solved to approximate the behaviour of the £0 
problem. As an illustration suppose that the sparse signal x is 
known a priori and that we set the weights as = 
In this case the weights are infinite at all locations where 
the signal is zero, forcing the coordinates of the solution 
vector at these locations to be zero. This set of weights 
makes the weighted norm independent of the precise value 
of the non-zero components and guarantees recovery of the 
correct solution assuming only K < M. In practice, the 
original signal is not known in advance but we can compute 
the appropriate weights by solving sequentially weighted £\ 
problems, each using as weights essentially the inverse of the 
values of the solution of the previous problem. 

As it is not possible to have infinite weights where the 
estimated signal values are zero, a stability parameter must 
also be added to the signal value in the selection of the 
weights. In practice, we update the weights at each iteration, 
i.e. after solving a complete weighted £\ problem, by the 
function 



(10) 



7+H' 

where 7 plays the role of a stabilization parameter (ideally 
zero), avoiding undefined weights when the signal value is 
zero. Note that as 7 — >> the weighted £\ norm approaches 
the £0 norm. To approximate the £0 norm by the reweighted 
£\ algorithm, we use a homotopy strategy ll38l and solve a 
sequence of weighted £\ problems using a decreasing sequence 
{7^}, with t denoting the iteration time variable. Under this 
scheme, a weighted £\ problem is first solved and its solution 
is used as the warm start initialization for the next problem 
that is geometrically closer to the £q problem. This process is 
then repeated until the solution becomes stationary 



C. The SARA algorithm 

The resulting algorithm after the previous considerations, 
dubbed sparsity averaging reweighted analysis (SARA), is 
defined in Algorithm [T] Note that in fT"2l some of the 



authors have presented the application of SARA to radio- 
interferometric imaging in astronomy. A rate parameter f3, 
with < /3 < 1, controls the decrease of the sequence 
7 (£) = /37 (£_1) = /3*7o such that 7^ as t 00. 
Ideally, 7^ should decrease to zero, but since we have noise, 
we set a lower bound as 7^ > a a . The standard deviation 
of the noise in the representation domain is computed as 
cr a = ^M/qN a n , which gives a rough estimate for a baseline 
above which significant signal components could be identified. 
As a starting point we set as the solution of the £\ problem 
and 7^ = a s {^\f T x^), where a s (-) stands for the empirical 
standard deviation of the signal, to fix the signal scale. The 
reweighting process ideally stops when the relative variation 
between successive solutions \\x^ — l^/H^ -1 ^ H2 is 

smaller than some bound 77, with < 77 < 1, or after the 
maximum number of iterations allowed, A^ max , is reached. In 
our implementation, which will be detailed in Section IIII-Di 
we fix 77 = 10" 3 and f3 10" 1 . 

Algorithm 1 SARA algorithm 
Input: y, 0, e, a a , f3, 77 and A^x- 
Output: Reconstructed image x. 



Initialize t = 1, = I and p = 1. 
Compute 

£(0) = A(y,*,W(°),e), 



7<°) = a s [W*x 
while p > 77 and t < 7V max do 
Update the weight matrix 



-i) 



with d^ -1 ) = ^ T x^ 
Compute a solution 

»W = A(7/,0,W( £ ),e). 

Update 7^ = max(/37 (t-1) , a a ). 

Update p = \\x^ - ^( £ - 1 )|| 2 /||^( £ - 1 )|| 

t<r- t + 1 

end while 
return x 



fori J = !,...,£> 



D. Solving the weighted £\ problem 

The SARA algorithm needs to solve © at each iteration. 
This convex problem involves the minimization of a non- 
differentiable function, which rules out smooth optimization 
techniques. To solve ©, we use the Douglas-Rachford 
splitting algorithm, tailored to solve problems of the form 



m jn fi(x) + / 2 (x), 



(ID 



where fi(x) and f2(x) are convex lower semicontinuous 
functions from W N to R, with the important property of not 
requiring differentiability of any of the functions [39J. Note 
that any convex constrained problem can be formulated as 
an unconstrained problem by using the indicator function of 
the convex constraint set as one of the functions in (Qj}, i.e. 
fk(x) = ic(&) where C represents the constraint set. Also 
note that complex-valued vectors are treated as real-valued 
vectors with twice the dimension (accounting for real and 
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imaginary parts). The Douglas-Rachford algorithm belongs 
to the family of proximal splitting methods. The key concept 
in these methods is the use of the proximity operator of a 
convex function, which is a natural extension of the notion 
of a projection operator onto a convex set. For example, the 
proximal operator of the i\ norm is the soft-thresolding oper- 
ator, and the proximal operator of the indicator function of a 
constraint is simply the projection operator onto the constraint 
set. We give a formal definition of the proximity operator in 
Appendix El Proximal splitting methods proceed by splitting 
the contribution of the functions j\ and f 2 individually so as 
to yield an easily implementable algorithm. They are called 
proximal because each non- smooth function in (fTTb is involved 
in the minimization via its proximity operator. See [|40l for a 
review of proximal splitting methods and their applications in 
signal and image processing. 

The problem in © can be reformulated as in (fTTt . with 
h(x) = HVW^Hi and f 2 (x) = i Ce ( x )> where C e = {x e 
^ N : \\y — ^11 2 < e}. Since many image processing 
applications deal with intensity images, one may want to add 
a reality and positivity constraint on the estimated signal. 
Then one can define f 2 (x) = ic e (x), where C' e = C e U 
R^. The general form of the Douglas-Rachford algorithm 
is presented in Algorithm [2] The sequence {x^} generated 
by Algorithm [2] has been proved to converge to a solution 
of problem (TTTb (39). The algorithm is stopped when the 
relative variation between the objective function evaluated at 
successive solutions, \fi(x^) — fi(x^ t ~ 1 ^ > ) \ / |/i(a;(* _1 ))|, is 
smaller than some bound £, with < £ < 1, or after the 
maximum number of iterations allowed, M max , is reached. 
v > and < r t < 2 are convergence parameters. In our 
implementation we use the values r t = 1, Vt, £ = 10 -3 and 
v = 10 -1 . We detail the implementation of the proximity 
operators of j\ and f 2 in Appendix lAl 



Algorithm 2 Douglas-Rachford algorithm 

l: Initialize t = 1, x^ G C N ', r t G (0,2), £ G (0,1) and 
v > 0. 

2: while C > £ an d t < M max do 
3: = prox^ 2 (a?W). 

4: x^ = aj« + r t (pro Xl//l (2z« - aj«) - zW). 

5: C=|/l(« (t) )-/l(« (t - 1) )|/|/l(« (t - 1) )|. 
6: t <- t + 1 

7: end while 
8: return x^ 



IV. Experimental Results 

In this section we evaluate the performance of the SARA 
algorithm through numerical simulations. Firstly, we detail the 
experimental set-up, describing the practical implementation 
of SARA and algorithms used as benchmarks. Secondly, we 
report the results of the comparison of SARA to the benchmark 
methods. Finally, we illustrate the application of SARA for 
Fourier imaging in astronomy and medicine. 



A. Experimental set-up 

In light of our discussion in Section IIII-Ai for all experi- 
ments we consider SARA as built from a concatenation of the 
first eight orthonormal Daubechies wavelet bases, Dbl-Db8. 
We use a fourth order decomposition depth for all wavelet 
bases. This choice results from a test over variations on the 
number of bases and decomposition depths (see Section HV-B I) . 

We compare SARA to analogous analysis algorithms, where 
the average sparsity prior is simply substituted for a state- 
of-the-art prior promoting sparsity in a single orthonormal 
basis or redundant frame, or in the gradient. These priors 
are implemented through the very same constrained analysis 
l\ and TV minimization problems (reweighted or not). The 
comparison is performed in terms of reconstruction quality and 
computation time. In practice, the non-reweighted t\ problems 
are defined through ® with three different options for the 
Daubechies 8 wavelet basis, the redundant curvelet frame lTT9l 
and the concatenation of the eight bases described above for 
SARA. The associated algorithms are respectively denoted 
BPDb8, Curvelet and BPSA. The reweighted i x problems 
considered are defined through the reweighting procedure 
described in Section IIII-BI based on ©, for each of the 
three frames considered: the Daubechies 8 wavelet basis, the 
curvelet frame, and the concatenation of Dbl-Db8 leading 
to SARA. The associated algorithms are respectively denoted 
RW-BPDb8, RW-Curvelet and SARA. The TV minimization 
problem is formulated as a constrained problem like ([5]), but 
replacing the i\ norm by the image TV norm. The TV norm 
is defined by the t\ norm of the magnitude of the gradient of 
the signal ||^||tv = ||Vsc||i, where denotes the gradient 
magnitude [36]. The reweighted version of TV, still through 
the procedure defined in Section IIII-BI is denoted as RW-TV. 
Since all the images of interest are positive, we impose the 
additional constraint that x G for all problems. 

We use as reconstruction quality metric the signal to noise 
ratio (SNR), which is defined as: 

SNR = 201og 1Q f I|X| L 2 V (12) 

where x and x denote the the original image and the estimated 
image respectively. The measurements are corrupted by 
complex Gaussian noise in general with a fixed input SNR. 
The input SNR is defined as ISNR = 201og 10 (||i/o||2/H| 2 ), 
where yo identifies the clean measurement vector. 

B. Simulations and results for spread spectrum and Gaussian 
measurements 

In this subsection we evaluate the reconstruction perfor- 
mance of SARA by recovering well-known test images from 
compressive measurements following the model in (Q]). We use 
256x256 pixel versions of Lena and cameraman as test im- 
ages. In order to have a fast measurement operator that obeys 
the D-RIP, we use for a first experiment the spread spectrum 
technique described in [26]. Spread spectrum incorporates a 
modulating sequence in the measurement operator defining it 
as = MFC, where C G IR ArxAr is a diagonal matrix with 
elements with unit norm and randomized sign, F G C NxN 
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Number of bases 

Figure 1. Reconstruction quality as a function of the number of bases in 
the dictionary for decomposition depths L = 1,4,8. Results for Lena with 
M = 0.37V and ISNR set to 30 dB. 

is the discrete Fourier operator and M G R MxN is a binary 
mask defining the random selection operator. 

Prior to our main analysis, we evaluate the reconstruc- 
tion performance of SARA as a function of the number of 
wavelet bases in the dictionary. We test decomposition depths 
L = 1, 4, 8 for all dictionaries. We add bases in parametric 
order, i.e., one basis means Dbl alone, two bases Dbl and Db2 
and so on until we reach the eight bases from Dbl-Db8. The 
results for Lena are summarized in Figure [T] We can observe 
that the best performance is obtained when L = 4 and the 
worst when L = 1. We can also observe that the reconstruction 
quality improves as the number of bases increases until it 
saturates between 4 to 8 bases. This results corroborates our 
choice for 8 bases, and depth L = 4. Results for cameraman 
lead to the same conclusion. 

Having validated the dictionary choice, we now proceed to 
evaluate the reconstruction quality of SARA as a function of 
the undersampling ratio M/N. We vary the undersampling 
ratio from 0.1 to 0.9. The SNR results comparing SARA 
against all the other benchmark methods are shown in Figures 
[2] (a) and [2] (d) for Lena and cameraman respectively. Aver- 
age values over 30 simulations and associated one-standard- 
deviation error bars are reported. The results demonstrate that 
SARA outperforms state-of-the-art methods for all undersam- 
pling ratios. SARA achieves gains between 0.7 and 1.4 dB for 
cameraman and between 0.9 and 1.9 dB for Lena. The largest 
gains are observed for undersampling ratios in the range 0.2- 
0.5 for both images. Notably, BPSA achieves better SNRs 
than BPDb8, curvelet and their reweighted versions for all 
undersampling ratios. It also achieves similar SNRs to TV for 
undersampling ratios in the range 0.4-0.9. 

Computation times (in seconds) as a function of the under- 
sampling ratio M/N are reported in Figures [2] (b) and[2](e), 
for Lena and cameraman respectively. Times are measured on 
a 2.4 GHz Xeon quad core for MATLAB implementations of 
all the algorithms. Again, average values over 30 simulations 
and associated one- standard-deviation error bars are reported. 
Although the concatenation of multiple bases render the algo- 
rithm structure more costly, we see that the computation times 
are of the order of minutes for SARA, being very similar to 
those required for RW-BPDb8 for undersampling ratios larger 
than 0.3. We can also observe that SARA is faster than RW- 



TV and RW-Curvelet while achieving better reconstruction 
quality. Note also that BPSA is faster than TV. 

The next experiment studies the robustness of SARA against 
noise in the measurements. We fix the number of mea- 
surements as M = 0.2N and add Gaussian noise to the 
measurements varying the input noise with ISNR in the range 
to 40 dB. The results are summarized in Figures [2 (c) and 
[2 (f) for Lena and cameraman respectively. Average values 
over 30 simulations and associated one- standard-deviation 
error bars are reported. As expected from the bound in (0), 
the relationship between SNR and ISNR is linear for low 
ISNRs until it is high enough and the reconstruction quality 
is dominated by the undersampling effect. Notably, SARA 
outperforms the benchmark methods in both test images for 
all ISNRs, achieving SNRs of 20 dB for ISNRs of dB. Again, 
BPSA yields a better performance than BPDb8, Curvelet and 
their reweighted versions. 

Next we present a visual assessment of the reconstruc- 
tion quality of SARA compared to the benchmark methods. 
Figure [3] shows the results from Lena and cameraman for 
M = 0.27V and ISNR of 30 dB. The first row show the 
original images. We only show the best image for each prior: 
reweighted or non-reweighted. Thus the results are shown 
for SARA, RW-TV, BPDb8 and Curvelet. SARA achieves, 
for cameraman and Lena respectively, a gain of 1.4 and 1.8 
dB over RW-TV, which is the second best method. Moreover, 
beyond an SNR increase, SARA provides an impressive reduc- 
tion of visual artifacts relative to the other methods in this high 
undersampling regime, as is evident from the reconstructions. 
In particular RW-TV exhibits expected cartoon-like artifacts. 
BPDb8 and Curvelet do not yield results of comparable quality 
relative to SARA, neither in SNR nor visually. 

As final experiment in this subsection, we study the per- 
formance of SARA with Gaussian random matrices as mea- 
surements operators. For this experiment we use a cropped 
version of Lena, around the head, of dimension 128x128 as 
test image (due to computational limitations for the use of 
a dense sensing matrix). We compare SARA against all the 
benchmark methods for this sensing modality. We evaluate 
the reconstruction quality as a function of the undersampling 
ratio M/N in the range 0.1 to 0.9. ISNR is set to 30 dB. The 
SNR results are reported in Figures |U Average values over 30 
simulations and associated one- standard-deviation error bars 
are reported. These results confirm the performance of SARA 
in compressive image reconstruction with a different sensing 
matrix. For M = 0.L/V SARA is 1 dB below TV and RW-TV 
and for M = 0.2N SARA achieves an SNR almost identical 
to RW-TV and TV. For the rest of undersampling ratios SARA 
outperforms the benchmark methods, achieving a gain between 
0.8 and 1.2 dB over RW-TV (second best method). 

C. Fourier imaging illustrations 

Here we illustrate potential applications of SARA in the 
context of Fourier imaging, specifically for aperture synthesis 
for radio-interferometric imaging in radio astronomy PHI and 
magnetic resonance imaging in medicine l42ll . For both 
techniques, the measurement operator can be modeled as 
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Figure 2. Reconstruction quality results for spread spectrum measurements. First column shows SNR results against the undersampling ratio for: (a) Lena 
and (d) cameraman. ISNR set to 30 dB. Second column shows the respective average computation times for: (b) Lena and (e) cameramen. Third column 
shows SNR as a function of ISNR for (c) Lena and (f) cameraman. Undersampling fixed to M = 0.27V. 



= MF, where F G (^nxn ^ ^ discrete Fourier operator 
and M G M M x N is a binary matrix representing the sampling 
profile. We consider measurements generated from random 
variable density sampling profiles. Such profiles allow one to 
account for the fact that most of the signal energy is usually 
concentrated around low frequencies, also mimicking common 
generic sampling patterns in these fields O, MfTOl - lfl2l . See 
ll43l for the exact shape of the density profile. Since we 
consider intensity images, only a half Fourier plane is sampled. 

For this experiment, we compare SARA to essentially the 
same benchmark methods. We however consider a concate- 
nation of nine bases as the dictionary for SARA and BPSA. 
The first basis is the Dirac basis and the eight remaining bases 
are the first eight Daubechies wavelets, Dbl-Db8. The reason 
to add the Dirac basis is that the images of interest are also 
sparse in the spatial domain. This does not break the overall 
coherence condition on W because of the compact support 
of the Daubechies wavelets. Also for these illustrations 
we consider the redundant frame of isotropic undecimated 
wavelets |[T8lL used previously in astronomy fi4l . instead 
of curvelets. The associated non-reweighted and reweighted 
methods are respectively denoted IUWT and RW-IUWT. 

For the radio-interferometric imaging illustration we use 
a cropped version of the radio galaxy Cygnus A R51L of 
dimension 300x526, as test image. We take a very adverse 
undersampling ratio M = 0.05N and set ISNR to 30 dB. 
The first two rows of Figure [5] show the test image and 
the backprojected image, which is defined as F^M T y. The 
SNR of the recovered image for each algorithm is as follows: 



SARA (39.4 dB), BPSA (32.9 dB), RW-TV (24.6 dB), TV 
(28.6 dB), RW-BPDb8 (35.9 dB), BPDb8 (35.5 dB), RW- 
IUWT (25.5 dB) and IUWT (26.3 dB). SARA achieves the 
best SNR followed by RW-BPDb8. The last two rows of 
Figure [5] show reconstructed images for SARA and RW- 
BPDb8. In addition to an SNR gain of 3.5 dB, SARA 
achieves a reconstruction with less visual artifacts, though a 
loss of resolution must also be acknowledged in this extreme 
undersampling regime. A detailed study of the application of 
SARA to radio-interferometric imaging is presented in |[T2l . 

For the magnetic resonance imaging illustration we recon- 
struct a 224x168 brain image for the same adverse under- 
sampling ratio M = O.ObN, which is unprecedented in the 
field. ISNR is set as 30 dB. The first row of Figure [6] show 
the original brain image and the backprojected image. The 
SNR of the recovered image for each algorithm is as follows: 
SARA (18.8 dB), BPSA (17.6 dB), RW-TV (16.9 dB), TV 
(17.3 dB), RW-BPDb8 (15.6 dB), BPDb8 (16.3 dB), RW- 
IUWT (16.2 dB) and IUWT (16.1 dB). SARA achieves the 
best SNR, followed by BPSA and TV. In the second row of 
Figure [6] we show reconstructed images for SARA and TV. 
In addition to an SNR gain of 1.5 dB, SARA achieves an 
impressively better reconstruction from the visual standpoint. 

V. Conclusion and Discussion 

In this paper we have proposed a novel regularization 
method for compressive image reconstruction along with its 
associated algorithm, dubbed Sparsity Averaging Reweighted 
Analysis (SARA). The approach relies on the conjecture that 
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Figure 3. Reconstruction example for Lena (left) and cameraman (right), 
M = 0.2N and ISNR set as 30 dB. Top row: original images. Left column 
from top to bottom, reconstructed images for: SARA (28.1 dB), RW-TV 
(26.3 dB), BPDb8 (21.4 dB) and Curvelet (18.7 dB). Right column from top 
to bottom, reconstructed images for: SARA (31.5 dB), RW-TV (30.1 dB), 
BPDb8 (22.9 dB) and Curvelet (23.1 dB). 
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Figure 4. Reconstruction quality against the undersampling ratio for cropped 
Lena image. Random Gaussian matrix as sensing matrix. ISNR set to 30 dB. 

natural images are simultaneously sparse in multiple frames, 
in particular wavelet frames, or in their gradient, so that pro- 
moting average signal sparsity over multiple coherent frames 
is an extremely powerful prior. The proposed constrained 
reweighted analysis algorithm is evaluated under three dif- 
ferent measurement schemes: spread spectrum and random 
Gaussian measurements, that are within the theory of CS with 
coherent and redundant dictionaries, and Fourier imaging for 
illustrative applications in astronomy and medicine. We have 
implemented SARA using the concatenation of orthonormal 
Daubechies wavelet bases and the Dirac basis. Experimental 
results demonstrate that the sparsity averaging prior embedded 
in SARA outperforms state-of-the-art priors that promote 
sparsity in a single orthonormal basis or redundant frame, or 
that promote gradient sparsity. In virtually all simulations, 
the sparsity averaging conjecture provided significantly better 
reconstruction qualities than any of the benchmark methods, 
both in terms of SNR and visual quality. These results 
appear even stronger when acknowledging the fact that the 
second best performance is not always given by the same 
method (RW-TV or RW-BPDb8), but depends on the simulated 
acquisition scheme and image of interest. 

Note that dictionary learning/optimization has been proven 
very effective in CS (e.g. ll27l ) and highlights a future road 
for SARA. In particular, one may readily consider a rudi- 
mentary evolution of SARA where some of the frames used 
in the initial concatenation, but where the signal estimated 
by a first run of SARA (dictionary learning phase) is not 
sparse, might be removed for a second run of the algorithm 
(reconstruction phase). This suggests an algorithm where we 
estimate iteratively the signal and the best set of frames. 
Also in future work, we will study the possibility to move 
the orthonormal wavelet bases used in our implementation 
to translation invariant dictionaries using the cycle spinning 
method l46l . Future work should also study SARA in imaging 
inverse problems such as denoising, deblurring or inpainting. 

Appendix A 
Proximity Operators 

Let / be a convex lower semicontinuous function from K N 
to R. The proximity operator of / is defined as: 

proxy, (x) = arg min f(z) + h\x - z\\l. (13) 



Figure 6. Brain image example. M = 0.057V and ISNR set as 30 dB. First 
row: Original brain image (left) and the backprojected image (right). Second 
row: reconstruction results for SARA (left, 18.8 dB) and TV (right, 17.3 dB). 




Figure 5. Astronomical image example, M = 0.057V and ISNR set as 30 dB. 
From top to bottom: original image, backprojected image, reconstructed 
image by SARA (39.4 dB) and reconstructed image by RW-BPDb8 (35.9 dB). 
All images are shown in a log 10 scale due to their high dynamic range. 



Again complex- valued vectors are treated as real- valued vec- 
tors with twice the dimension. To compute the proximity 
operator of fi, let us first define the function 

D 

1(a) = ||Wa||i = $>;H, (14) 

2=1 

where uji = (since W is a diagonal positive matrix). The 
function i{a) is the weighted £i norm of a. The proximity 
operator of i{ot) is given by prox^ = jprox^.i.^o^) j < > 
where prox A |.| is the entry- wise soft- thresholding operator 



defined as 

x 

prox AM 0) = — (\x\ - A) + , (15) 

with (■)+ = max(0, •). See [40] and references therein for a 
derivation of this result. The proximity operator of j\ can be 
computed as follows. Let < ji t < 2/||^ t || 2 , where || • || 
denotes the operator norm, and define the recursion 

<u (£+1) = /it (l - prox^-i € ) (V V £) + ^ (16) 

w (t+i) = x _ 

Then — >• prox^ linearly l47l . The proximity operator 
of f2(x) = ic e ( x ) is me projector onto the convex set 
C e = {x e C N : \\y - 4>x\\ 2 < e}. If the reality 
and positivity constraint is active, computing the proximity 
operator of f2(x) = ic e (x), with C' e = C e U R+, is akin to 
solve a feasibility problem. We can decompose J2 as 

f 2 (x)=g(<$>x-y) + h(x), (17) 

where h(z) = i r n(z), g(z) = iB e ( z ) an d B e = {z e C M : 
|| z\\ 2 < e}. Using this decomposition, we can use the dual 
forward-backward algorithm [40] to compute proxj 2 . 
Let < K t < 2/||<l>|| 2 and define the recursion 

= Kt (" " prox K -i g ) (V V t} + $g (t) - 2/) (18) 

= prox^ (a? - *V t+1) ) , 
where prox^(z) = {(^) + } 1 < i < iV and prox^(z) = 
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min(l, e/||z||2)z. Then — >• proxj 2 linearly |40ll . If the 
positivity constraint is not active, prox^ is replaced by the 
identity. In practice we accelerate the recursion in (TT81) using 
a Nesterov-type update ll48l . 
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